{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Intro to ``scikit-learn``, machine learning in Python\n",
    "\n",
    "Part of our team's ongoing [Data Science Workshops](https://github.com/DrSkippy27/Data-Science-45min-Intros)\n",
    "\n",
    "2014-04-04, Josh Montague ([@jrmontag](https://twitter.com/jrmontag))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Overview\n",
    "\n",
    "This is a simple introduction to the ``sklearn`` API, with two examples using built-in datasets:  \n",
    "1. K-Nearest Neighbors (supervised)\n",
    "2. Linear regression (unsupervised)\n",
    "\n",
    "This session doesn't go into the implementations of the estimator algorithms, nor how to choose them appropriately. However, the [official docs](http://scikit-learn.org/stable/documentation.html) are incredible, and have more information than you will probably every have time to read. See also [this fantastic flowchart](http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html) on when to use which algorithm."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Import some libraries for data munging and inspection:\n",
    "These aren't the ``sklearn`` library, but they're important for doing science in Python.\n",
    "If this doesn't work, you'll need to   \n",
    "``$pip install numpy``   \n",
    "and  \n",
    "``$pip install matplotlib``\n",
    "\n",
    "For this tutorial, I'm using ``numpy`` version ``1.12.1`` and ``matplotlib`` version ``2.0.2`` with Python version ``3.6.1``\n",
    "\n",
    "Later we'll import code from ``sklearn`` to do the machine learning."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import random\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get & inspect the data\n",
    "\n",
    "First, we'll get some sample data from the built-in collection. Fisher's [famous](http://en.wikipedia.org/wiki/Iris_flower_data_set) iris dataset is a great place to start. The datasets are of type ``bunch``, which is dictionary-like."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import the 'iris' dataset from sklearn\n",
    "from sklearn import datasets\n",
    "iris = datasets.load_iris()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use some of the keys in the ``bunch`` to poke around the data and get a sense for what we have to work with. Generally, the ``sklearn`` built-in data has ``data`` and ``target`` keys whose values (arrays) we'll use for our machine learning."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"data dimensions (array): {} \\n \".format(iris.data.shape))\n",
    "print(\"bunch keys: {} \\n\".format(iris.keys()))\n",
    "print(\"feature names: {} \\n\".format(iris.feature_names))\n",
    "\n",
    "# the \"description\" is a giant text blob that will tell you more \n",
    "#    about the contents of the bunch\n",
    "print(iris.DESCR)    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Have a look at the actual data we'll be using. Note that the ``data`` array has four features for each sample (consistent with the \"feature names\" above), and the labels can only take on the values 0-2.\n",
    "\n",
    "I'm still getting familiar with slicing, indexing, etc ``numpy`` arrays, so I find it helpful to have [the docs](http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html) open somewhere."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# preview 'idx' rows/cols of the data\n",
    "idx = 6\n",
    "\n",
    "print(\"example iris features: \\n {} \\n\".format(iris.data[:idx]))\n",
    "print(\"example iris labels: {} \\n\".format(iris.target[:idx]))\n",
    "print(\"all possible labels: {} \\n\".format(np.unique(iris.target)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It's always a good idea to throw out some scatter plots (if the data is appropriate) to see the space our data covers. Since we have four features, we can just grab some pairs of the data and make simple scatter plots."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize = (16,4))\n",
    "\n",
    "plt.subplot(131)\n",
    "plt.scatter(iris.data[:, 0:1], iris.data[:, 1:2])\n",
    "plt.xlabel(\"sepal length (cm)\")\n",
    "plt.ylabel(\"sepal width (cm)\")\n",
    "plt.axis(\"tight\")\n",
    "\n",
    "plt.subplot(132)\n",
    "plt.scatter(iris.data[:, 1:2], iris.data[:, 2:3])\n",
    "plt.xlabel(\"sepal width (cm)\")\n",
    "plt.ylabel(\"petal length (cm)\")\n",
    "plt.axis(\"tight\")\n",
    "\n",
    "plt.subplot(133)\n",
    "plt.scatter(iris.data[:, 0:1], iris.data[:, 2:3])\n",
    "plt.xlabel(\"sepal length (cm)\")\n",
    "plt.ylabel(\"petal length (cm)\")\n",
    "plt.axis(\"tight\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And, what the heck, this is a perfectly good opportunity to try out a 3D plot, too, right? We'll also add in the ``target`` from the dataset - that is, the actual labels that we're getting to - as the coloring. Since we only have three dimensions to plot, we have to leave something out. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from mpl_toolkits.mplot3d import Axes3D\n",
    "\n",
    "fig = plt.figure(figsize = (10,8))\n",
    "ax = plt.axes(projection='3d')\n",
    "ax.view_init(15, 60)   # (elev, azim) : adjust these to change viewing angle!\n",
    "\n",
    "\n",
    "x = iris.data[:, 0:1]\n",
    "y = iris.data[:, 1:2]\n",
    "z = iris.data[:, 2:3]\n",
    "\n",
    "# add the last dimension for use in e.g. the color!\n",
    "label1 = iris.data[:, 3:4]\n",
    "# can also color the data according to the actual labels \n",
    "label2 = iris.target\n",
    "\n",
    "ax.scatter(x, y, z, c=label2)\n",
    "\n",
    "plt.xlabel(\"sepal length (cm)\")\n",
    "plt.ylabel(\"sepal width (cm)\")\n",
    "ax.set_zlabel(\"petal length (cm)\")\n",
    "#plt.axis(\"tight\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since this dataset has a collection of \"ground truth\" labels (``label2`` in the previous graph), this is an example of supervised learning. We tell the algorithm the right answer a whole bunch of times, and look for it to figure out the best way to predict labels of future data samples."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Learning & predicting\n",
    "\n",
    "In ``sklearn``, model objects (implementing a particular algorithm) are called \"estimators\". Estimators implement the methods ``fit(X,y)`` and ``predict(X)``.\n",
    "\n",
    "First things first, we want to get all the sample feature (``data``) and label (``target``) arrays. Then, we'll randomly split up the data sets into training and test sets by shuffling the order and slicing off the last 'subset' samples to hold for testing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "iris_X = iris.data\n",
    "iris_y = iris.target\n",
    "\n",
    "r = random.randint(0,100)\n",
    "np.random.seed(r)\n",
    "idx = np.random.permutation(len(iris_X))\n",
    "\n",
    "subset = 25\n",
    "\n",
    "iris_X_train = iris_X[idx[:-subset]]  # all but the last 'subset' rows\n",
    "iris_y_train = iris_y[idx[:-subset]]\n",
    "iris_X_test  = iris_X[idx[-subset:]]  # the last 'subset' rows\n",
    "iris_y_test  = iris_y[idx[-subset:]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see that we're choosing the training and test samples, we can again plot them to see how they're distributed. If you re-run the ``random.seed`` code, it should choose a new random collection of the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize= (6,6))\n",
    "\n",
    "plt.scatter(iris_X_train[:, 0:1]\n",
    "        , iris_X_train[:, 1:2]\n",
    "        , c=\"blue\"\n",
    "        , s=30\n",
    "        , label=\"train data\"\n",
    "        )\n",
    "\n",
    "plt.scatter(iris_X_test[:, 0:1]\n",
    "        , iris_X_test[:, 1:2]\n",
    "        , c=\"red\"\n",
    "        , s=30        \n",
    "        , label=\"test data\"        \n",
    "        )\n",
    "plt.xlabel(\"sepal length (cm)\")\n",
    "plt.ylabel(\"sepal width (cm)\")\n",
    "plt.legend()\n",
    "#_ = plt.axis(\"tight\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we use the ``sklearn`` package and create a nearest-neighbor classification estimator (with all the default values). After instantiating the object, we use its ``fit()`` method and pass it the training data - both features and labels. Note that the ``__repr__()`` here tells you about the default values if you want to adjust them. Of course you can always just look at the [documentation](http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html#sklearn.neighbors.KNeighborsClassifier), too!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "\n",
    "knn = KNeighborsClassifier()\n",
    "\n",
    "# fit the model using the training data and labels\n",
    "knn.fit(iris_X_train, iris_y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At this point, the trained ``knn`` estimator object has, internally, the \"best\" mapping from input to output. It can now be used to predict new data via the ``predict()`` method. In this case, the prediction is which class the new samples' features best fit - a simple 1D array of labels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# predict the labels for the test data, using the trained model\n",
    "iris_y_predict = knn.predict(iris_X_test)\n",
    "\n",
    "# show the results (labels)\n",
    "iris_y_predict"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since this data came labeled, we can actually look at the actual *correct* answers. As this list grows in size, it's trickier to note the differences or similarities. But, still, it looks like it did a pretty good job."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "iris_y_test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Thankfully, ``sklearn`` also has [many built-in ways](http://scikit-learn.org/stable/modules/model_evaluation.html#model-evaluation-quantifying-the-quality-of-predictions) to gauge the \"accuracy\" of our trained estimator. The simplest is just \"what fraction of our classifications did we get right?\" Clearly easier than inspecting by eye. Note: occasionally, this estimator actually reaches 100%. If you increase the \"subset\" that's cut out for testing, you're likely to see a decrease in the ``accuracy_score``, which makes it easier to visualize."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import accuracy_score\n",
    "\n",
    "accuracy_score(iris_y_test, iris_y_predict)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So we know the successful prediction percentage, but we can also do a visual inspection of how the labels differ. Even for this small dataset, it can be tricky to spot the differences between the true values and predicted values; an ``accuracy_score`` in the 90% range means that only one or two samples will be incorrectly predicted."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize = (12,6))\n",
    "\n",
    "plt.subplot(221)\n",
    "plt.scatter(iris_X_test[:, 0:1]     # real data\n",
    "        , iris_X_test[:, 1:2]\n",
    "        , c=iris_y_test         # real labels\n",
    "        , s=100\n",
    "        , alpha=0.6\n",
    "        )\n",
    "plt.ylabel(\"sepal width (cm)\")\n",
    "plt.title(\"real labels\")\n",
    "\n",
    "plt.subplot(223)\n",
    "plt.scatter(iris_X_test[:, 0:1]\n",
    "        , iris_X_test[:, 2:3]\n",
    "        , c=iris_y_test\n",
    "        , s=100\n",
    "        , alpha=0.6\n",
    "        )\n",
    "plt.xlabel(\"sepal length (cm)\")\n",
    "plt.ylabel(\"petal length (cm)\")\n",
    "\n",
    "\n",
    "plt.subplot(222)\n",
    "plt.scatter(iris_X_test[:, 0:1]     # real data\n",
    "        , iris_X_test[:, 1:2]\n",
    "        , c=iris_y_predict      # predicted labels\n",
    "        , s=100\n",
    "        , alpha=0.6\n",
    "        )\n",
    "plt.ylabel(\"sepal width (cm)\")\n",
    "plt.title(\"predicted labels\")\n",
    "\n",
    "plt.subplot(224)\n",
    "plt.scatter(iris_X_test[:, 0:1]\n",
    "        , iris_X_test[:, 2:3]\n",
    "        , c=iris_y_predict\n",
    "        , s=100\n",
    "        , alpha=0.6\n",
    "        )\n",
    "plt.xlabel(\"sepal length (cm)\")\n",
    "plt.ylabel(\"petal length (cm)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively, if you have more ``numpy`` and ``matplotlib`` skillz than I currently have, you can also visualize the resulting model of a similar estimator like so: ([source](http://scikit-learn.org/stable/auto_examples/neighbors/plot_classification.html))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.core.display import Image\n",
    "Image(filename='./iris_knn.png', width=500)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Once more, but with model persistence\n",
    "\n",
    "Now let's work an example of unsupervised learning.\n",
    "\n",
    "After some time with exploration like in the last example, we'll get a handle on our data, the features that will be helpful, and the general pipeline of analysis. In order to make the analysis more portable (and also when issues of scale become important), we may want to train our estimator once (as well as we possibly can, without overfitting), and then use that estimator for predictions of new data for quite a while. If fitting the estimator takes an hour (or a day!), we definitely don't want to repeat that process any more than necessary.\n",
    "\n",
    "\n",
    "Enter [``pickle``](https://docs.python.org/2/library/pickle.html#).\n",
    "\n",
    "``pickle`` is a way to serialize and de-serialize Python objects; that is, convert an in-memory structure to a byte stream that you can write to file, move around, and read back into memory as a full-fledged Python object. \n",
    "\n",
    "First, let's work with a slightly trimmed-down analysis pipeline this time. First, get some data (still built in), and have a look. This data is on house sales in Boston around the 1970s (hence the ridiculously low prices, relative to today!). The features are all kinds of interesting things, and the ``targets`` are the sale prices."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# boston home sales in the 1970s\n",
    "boston = datasets.load_boston()\n",
    "\n",
    "boston.keys()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# get the full info\n",
    "print(boston.DESCR)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The ``feature_names`` are less obvious in this dataset, but if you print out the ``DESCR`` above, there's a more detailed explanation of the features."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"dimensions: {}, {} \\n\".format(boston.data.shape, boston.target.shape))\n",
    "print(\"features (defs in DESCR): {} \\n\".format(boston.feature_names))\n",
    "print(\"first row: {}\\n\".format(boston.data[:1]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "While a better model would take incorporate all of these features (with appropriate normalization), let's focus on just one for the moment. Column 5 is the number of rooms in each house. Seems reasonable that this would be a decent predictor of sale price. Let's have a quick look at the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rooms = boston.data[:, 5]\n",
    "\n",
    "plt.figure(figsize = (6,6))\n",
    "plt.scatter(rooms, boston.target, alpha=0.5)\n",
    "plt.xlabel(\"room count\")\n",
    "plt.ylabel(\"cost ($1000s)\")\n",
    "plt.axis(\"tight\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ok, so we can work with this - there's definitely some correlation between these two variables. Let's imagine that we just knew we wanted to fit this immediately, without all the inspection. Furthermore, we wanted to build a model, fit the estimator, and then keep it around for use in an analysis later on (via ``pickle``). Let's go back a step and start from loading the data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# here comes the data\n",
    "boston = datasets.load_boston()\n",
    "\n",
    "# a little goofyness to get a \"column vector\" for the estimator\n",
    "b_X_tmp = boston.data[:, np.newaxis]\n",
    "b_X = b_X_tmp[:, :, 5]\n",
    "b_y = boston.target\n",
    "\n",
    "# split it out into train/test again\n",
    "r = random.randint(0,100)\n",
    "np.random.seed(r)\n",
    "idx = np.random.permutation(len(b_X))\n",
    "\n",
    "subset = 50\n",
    "\n",
    "b_X_train = b_X[idx[:-subset]]  # all but last 'subset' rows\n",
    "b_y_train = b_y[idx[:-subset]]\n",
    "b_X_test  = b_X[idx[-subset:]]  # last 'subset' rows\n",
    "b_y_test  = b_y[idx[-subset:]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Fit the estimator (and this time let's print out the fitted model parameters)..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# get our estimator & fit to the training data \n",
    "from sklearn.linear_model import LinearRegression\n",
    "lr = LinearRegression()\n",
    "\n",
    "print(lr.fit(b_X_train, b_y_train))\n",
    "print(\"Coefficients: {},{} \\n\".format(lr.coef_, lr.intercept_))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And now let's imagine we spent a ton of time building this model, so we want to save it to disk:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pickle\n",
    "\n",
    "p = pickle.dumps(lr)\n",
    "print(p) # looks super useful, right?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Write the pickle to disk, and then navigate to this location in another shell and ``cat`` the file. It's pretty much what you see above. Pretty un-helpful to the eye."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# write this fitted estimator (python object) to a byte stream\n",
    "pickle.dump(lr,open('./lin-reg.pkl', 'wb'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cat ./lin-reg.pkl"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*But*, now imagine we had another process that had some new data and we wanted to use this pre-existing model to predict some results. We just read in the faile, and deserialize it. You can even check the coefficients to see that it's \"the same\" model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# now, imagine you've previously created this file and stored it off somewhere... \n",
    "new_lr = pickle.load(open('./lin-reg.pkl', 'rb'))\n",
    "    \n",
    "print(new_lr)  \n",
    "print(\"Coefficient (compare to previous output): {}, {} \\n\".format(new_lr.coef_, new_lr.intercept_))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And now we can use it to predict the ``target`` for our housing data (remember, we use the \"test\" data for measuring the success of our estimator."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "b_y_predict = new_lr.predict(b_X_test)\n",
    "\n",
    "#b_y_predict  # you can have a look at the result if you want"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we can have a look at how we did. Below, we can look at the best-fit line through all of the data (both \"test\" and \"train\"). Then, we also compare the predicted fit results (test data) to the actual true results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize= (12,5))\n",
    "\n",
    "plt.subplot(121)\n",
    "plt.scatter(b_X, b_y, c=\"red\")\n",
    "plt.plot(b_X, new_lr.predict(b_X), '-k')\n",
    "plt.axis('tight')\n",
    "plt.xlabel('room count')\n",
    "plt.ylabel('predicted price ($1000s)')\n",
    "plt.title(\"fit to all data\")\n",
    "\n",
    "plt.subplot(122)\n",
    "plt.scatter(b_y_test, b_y_predict, c=\"green\")\n",
    "plt.plot([0, 50], [0, 50], '--k')\n",
    "plt.axis('tight')\n",
    "plt.xlabel('true price ($1000s)')\n",
    "plt.ylabel('predicted price ($1000s)')\n",
    "plt.title(\"true- and predicted-value comparison (test data)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It's not so bad! We left out all sorts of things that might help, like weighting, normalization, ..., but it's not bad for how easy it was.\n",
    "\n",
    "----\n",
    "\n",
    "So, generally, the approach to using the ``sklearn`` package is to choose an estimator, split your data, fit on the training data, predict on the testing data, and then celebrate how easy ``scikit-learn`` has made this process:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "    from sklearn import estimator\n",
    "\n",
    "    e = Estimator()\n",
    "    e.fit(train_samples [, train_labels])\n",
    "    e.predict(test_samples [, test_labels])\n",
    "    \n",
    "    rejoice()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
